Abstract

Under the paradigm of caching, partial data is delivered before the actual requests of users are 
known. In this paper, this problem is modeled as a canonical distributed source coding problem with 
side information, where the side information represents the users’ requests. For the single-user case, a 
single-letter characterization of the optimal rate region is established, and for several important special 
cases, closed-form solutions are given, including the scenario of uniformly distributed user requests. In 
this case, it is shown that the optimal caching strategy is closely related to total correlation and Wyner’s 
common information. Using the insight gained from the single-user case, three two-user scenarios 
admitting single-letter characterization are considered, which draw connections to existing source coding 
problems in the literature: the Gray-Wyner system and distributed successive refinement. Finally, the 
model studied by Maddah-Ali and Niesen is rephrased to make a comparison with the considered 
information-theoretic model. Although the two caching models have a similar behavior for the single- 
user case, it is shown through a two-user example that the two caching models behave differently in 
general. 


Index Terms 

Coded caching, function computation, Gray-Wyner system, multi-terminal source coding, source 
coding with side information, successive refinement. 

I. Introduction 

Consider a sports event filmed simultaneously by many cameras. After the game, a sports
aficionado would like to watch a customized video sequence on his mobile device that shows
him, in every moment, the best angle on his favorite player in the game. Of course, he would
like that video as soon as possible. To meet such demand, the provider could choose to use the
paradigm of caching: even before knowing the precise camera angles of interest, cleverly coded
partial data is delivered to the end user device. If all goes well, that partial data is at least partially 
useful, and hence, at delivery time, a much smaller amount of data needs to be downloaded, 
leading to faster (and possibly cheaper) service. To make matters even more interesting, there 
might be several users in the same mobile cell with the same wish, except that they probably 
have different favorite players. Now, the caching technique could be employed at the base station 
of the mobile cell, and the goal is to design cache contents such that at delivery time, almost 
all users experience a faster download speed. 

In the present paper, we model this situation in a canonical information-theoretic fashion. We 
model the data as a long sequence $X_1, X_2, X_3, \ldots$, where the subscript may represent the time
index. That is, in the above example, $X_i$ would represent the full collection of video images
acquired at time $i$. Furthermore, in our model, the user defines a separate request for each time
instant $i$. Hence, the user's requests are also modeled as a sequence $Y_1, Y_2, Y_3, \ldots$, whose length
we assume to be identical to the length of the data sequence. That is, in the above example, $Y_i$
represents the user's desired camera angle at time $i$. There are two encoders: the cache encoder
and the update encoder. The cache encoder only gets to observe the data sequence, and encodes
it using an average rate of $R_c$ bits per symbol. The update (or data delivery) encoder gets to
observe both the data sequence and the request sequence, and encodes them jointly using an
average rate of $R_u$ bits per symbol. At the decoding end, to model the user's desired data, we
consider a per-letter function $g(X_i, Y_i)$ which needs to be recovered losslessly for all $i$. For
example, we may think of $X_i$ as being a vector of a certain length, and of $Y_i$ as the index of the
component of interest to the user at time $i$. Then, $g(X_i, Y_i)$ would simply return the component
indexed by $Y_i$ from the vector $X_i$. The goal of this paper is to characterize the set of those rate
pairs $(R_c, R_u)$ that are sufficient to enable the user to attain his/her goal of perfectly recovering
the entire sequence $g(X_1, Y_1), g(X_2, Y_2), g(X_3, Y_3), \ldots$.

When there are multiple users, different end users normally have distinct requests and functions,
denoted by $Y^{(\ell)}$ and $g_\ell(\cdot, \cdot)$, respectively, for end user $\ell$. Moreover, different end users
may share caches and updates. However, in this work we make the simplification that each
user's request is known to all users.* That is, denoting by $Y_i = (Y_i^{(1)}, Y_i^{(2)}, \ldots)$ the collection of
requests, we assume that $Y_i$ is globally known except to the cache encoder. Thus, we denote
$f_\ell(X_i, Y_i) = g_\ell(X_i, Y_i^{(\ell)})$ and focus on the functions $\{f_\ell(\cdot, \cdot)\}$ hereafter.

An important inspiration for the work presented here is the pioneering work of Maddah-Ali
and Niesen [1], [2]. Their work emphasizes the case when there is a large number of users and
develops clever caching strategies that centrally leverage a particular multi-user advantage: cache
contents are designed in such a way that each update (or delivery) message is simultaneously useful
for as many users as possible. Our present work places the emphasis on the statistical properties
of the data and requests, exploiting these features in a standard information-theoretic fashion by
coding over multiple instances.

On the modeling side, one could relate the Maddah-Ali-Niesen model to our model in the 
following manner: Consider a single data X and a single request Y. This X is composed of 
N files, each containing F bits, and the Y designates the index of the desired file. Then, the 
considered information-theoretic model corresponds to coding over multiple instances of (X, Y), 
assuming F is a fixed constant. On the other hand, the Maddah-Ali-Niesen model corresponds to 
coding over a single instance of (X, Y) but the file size F can be arbitrarily large. In this paper, 
the Maddah-Ali-Niesen model will also be referred to as the “static request” model. Ample 
results are available for the static request model at this point: the worst-case analysis [1], the
average-case analysis [3], decentralized [2], delay-sensitive [4], online [5], multiple layers [6],
request of multiple items [7], secure delivery [8], wireless networks [9], [10], etc. In addition,
some improved order-optimal results for the average case can be found in [11], [12].

The Maddah-Ali-Niesen model fits well with applications in which the users’ requests remain 
fixed over the entire time period of interest, e.g., on-demand video streaming of a movie from 
a given database. By contrast, our model may be an interesting fit for applications in which the 
users’ requests change over time, such as the multi-view video system example in the beginning. 
Furthermore, our model fits well with sensor network applications. In many cases, only the 
sensor data (modeled as X) with certain properties (modeled as Y) are of interest and the 
desired properties may vary over a timescale of minutes, hours, or even days. In terms of coding 
strategies, we also take a different approach from [1], [2] and the follow-up works, in which

*In practice, this can be realized by the server broadcasting requests to all users. Compared with the desired data itself,
the amount of information contained in the requests is usually much smaller, and even more so when the request distribution
is far from uniform. The penalty due to the overhead of revealing requests will then be acceptable for some applications.
Fig. 1. The single-user case.



Fig. 2. The Gray-Wyner system in the considered setup. 


the core schemes are based on linear network coding. By contrast, by formulating the caching 
problem into a multi-terminal source coding problem, our main tools are standard information- 
theoretic arguments, including joint typicality encoding/decoding, superposition coding, and 
binning. 

A. Related Works 

The structure of many source coding problems studied in the literature can be captured by our 
formulation. Let us start with the single-user case, as shown in Figure 1. Denote by $R_c$ and $R_u$
the rates of the cache and the update, respectively. Depending on the availability of the cache
and the update, each configuration can be seen as a special case or a straightforward extension
of

1) $(0, R_u)$: lossless source coding with side information [13];

2) $(R_c, 0)$: lossy source coding with side information [14] or lossless coding for computing
with side information [15];

3) $(R_c, R_u)$: lossless source coding with a helper [16], [17].

As for the two-user case, two classes of source coding problems are related to our problem 
setup: the Gray-Wyner system and the problem of successive refinement. 
Fig. 3. The problem of successive refinement in the considered setup.


In the Gray-Wyner system, each decoder receives a private message and a common message. A
system diagram of the considered setup is shown in Figure 2. The optimal rate-distortion region
was characterized in [18]. The extension to include distinct side information at the decoders was
studied in [19] and [20, Section V].

In the problem of successive refinement [21], [22], one of the decoders has no private message,
i.e., all of its received messages are also available at the other decoder. A system diagram in
the considered setup is shown in Figure 3. The optimal rate-distortion region was characterized
in [23]. The extension to include distinct side information at the decoders was studied in [24]-[26].
The problem of successive refinement can also be extended to multiple sources [27], [28].

One special case of the multi-source extension is the problem of sequential coding of correlated
sources [29].

B. Summary of Results 

In this work we make some progress for the single-user case and extend the results to some 
two-user scenarios. For the single-user case, the results are summarized as follows: 

• Theorem 1 provides a single-letter characterization of the optimal rate region.

• Propositions 1 and 2 give the exact optimal rate regions for the cases of independent components
and nested components, respectively, and confirm that some intuitive caching strategies
are indeed optimal.

• Proposition 3 shows that if the components are uniformly requested, then the optimal caching
strategy is to cache a description of the data that minimizes the conditional total correlation.

For the two-user case, we find single-letter expressions for the following scenarios:

• the private-update-aided Gray-Wyner system (Theorem 2);

• the common-cache-aided Gray-Wyner system (Theorem 3);

• the problem of sequential successive refinement (Theorem 4).

In addition, we show that the information-theoretic model and the average-case formulation of
the Maddah-Ali-Niesen model have the same scaling behavior for the single-user case. However,
through a two-user example, we show that in general the two caching models behave differently
and coding over multiple blocks can be beneficial.

The paper is organized as follows. In Section II, we provide the problem formulation for the
general two-user case. Section III is devoted to the single-user case. In Section IV, we consider
several two-user scenarios which are extensions of the Gray-Wyner system and the problem of
successive refinement. In Section V, we go back to the single-user case and discuss the static
request model. Finally, we conclude in Section VI. The lengthy proofs are deferred to appendices.

C. Notations 

Random variables and their realizations are represented by uppercase letters (e.g., $X$) and
lowercase letters (e.g., $x$), respectively. We use calligraphic symbols (e.g., $\mathcal{X}$) to denote sets. Denote
by $|\cdot|$ the cardinality of a set. We denote $[a] := \{1, 2, \ldots, \lfloor a \rfloor\}$ for all $a \ge 1$ and
$\mathcal{A} \setminus \mathcal{B} := \{x \in \mathcal{A} \mid x \notin \mathcal{B}\}$. We denote $X^k := (X_1, X_2, \ldots, X_k)$ and
$X^{[k] \setminus \{i\}} := (X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_k)$.
Throughout the paper, all logarithms are to base two. Let $h_b(p) := -p \log(p) - (1-p) \log(1-p)$
for $p \in [0, 1]$ and $0 \log(0) := 0$ by convention. We denote $(x)^+ := \max\{x, 0\}$ and follow the
$\epsilon$-$\delta$ notation in [30].


II. Problem statement 

For convenience, we provide a problem statement for the general two-user case and then restrict 
attention to four special cases of interest. Consider a discrete memoryless source (DMS) $(X, Y)$
with finite alphabet $\mathcal{X} \times \mathcal{Y}$ and joint probability mass function (pmf) $p_{X,Y}$. The DMS $(X, Y)$
generates an independent and identically distributed (i.i.d.) random process $((X_i, Y_i) : i \in \mathbb{Z}^+)$.

Denote by $k$ a positive integer. There are two encoding terminals and two decoding terminals.
The cache encoder observes the source sequence $X^k$, the update encoder observes the source
sequences $(X^k, Y^k)$, and both decoders observe $Y^k$. Decoder $\ell \in \{1, 2\}$ wishes to recover an
element-wise function $f_\ell(x, y)$ losslessly. The cache encoder generates three messages $M_{c,\{1\}}$,
$M_{c,\{2\}}$, and $M_{c,\{1,2\}}$ of rates $R_{c,\{1\}}$, $R_{c,\{2\}}$, and $R_{c,\{1,2\}}$, respectively. Similarly, the update encoder
generates three messages $M_{u,\{1\}}$, $M_{u,\{2\}}$, and $M_{u,\{1,2\}}$ of rates $R_{u,\{1\}}$, $R_{u,\{2\}}$, and $R_{u,\{1,2\}}$,
respectively. We note that some of these rates may be zero. Then, Decoder $\ell \in \{1, 2\}$ receives the
set of messages $(M_{c,\{\ell\}}, M_{c,\{1,2\}}, M_{u,\{\ell\}}, M_{u,\{1,2\}})$. The messages $M_{c,\{\ell\}}$ and $M_{u,\{\ell\}}$ are called
private cache content and private update content of User $\ell \in \{1, 2\}$, respectively. Besides, the
messages $M_{c,\{1,2\}}$ and $M_{u,\{1,2\}}$ are called common cache content and common update content,
respectively.

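For convenience, we denote

$\mathbf{R} := (R_{c,\{1\}}, R_{c,\{2\}}, R_{c,\{1,2\}}, R_{u,\{1\}}, R_{u,\{2\}}, R_{u,\{1,2\}})$,

$2^{k\mathbf{R}} := (2^{kR_{c,\{1\}}}, 2^{kR_{c,\{2\}}}, 2^{kR_{c,\{1,2\}}}, 2^{kR_{u,\{1\}}}, 2^{kR_{u,\{2\}}}, 2^{kR_{u,\{1,2\}}})$.

Then, a $(2^{k\mathbf{R}}, k)$ distributed multiple description code consists of ($\mathcal{A} \in \{\{1\}, \{2\}, \{1, 2\}\}$)

• two encoders, where the cache encoder assigns three indices $m_{c,\mathcal{A}}(x^k) \in [2^{kR_{c,\mathcal{A}}}]$ to each
sequence $x^k \in \mathcal{X}^k$ and the update encoder assigns three indices $m_{u,\mathcal{A}}(x^k, y^k) \in [2^{kR_{u,\mathcal{A}}}]$ to
each pair of sequences $(x^k, y^k) \in \mathcal{X}^k \times \mathcal{Y}^k$;

• two decoders, where Decoder $\ell \in \{1, 2\}$ assigns an estimate $\hat{s}_\ell^k$ to each tuple
$(m_{c,\{\ell\}}, m_{c,\{1,2\}}, m_{u,\{\ell\}}, m_{u,\{1,2\}}, y^k)$.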

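A rate tuple $\mathbf{R}$ is said to be achievable if there exists a sequence of $(2^{k\mathbf{R}}, k)$ codes with

$\lim_{k \to \infty} \mathrm{P}\left( \bigcup_{\ell \in \{1, 2\}} \left\{ \hat{S}_\ell^k \neq \{f_\ell(X_i, Y_i)\}_{i \in [k]} \right\} \right) = 0$.

Note that the probability is taken with respect to both $X^k$ and $Y^k$. The optimal rate region $\mathcal{R}^*_{\mathrm{full}}$
is the closure of the set of achievable rate tuples.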

In this paper, we only consider the following projections of $\mathcal{R}^*_{\mathrm{full}}$ in which some of the rate
components are set to zero:

1) the single-user case

$\mathcal{R}^* := \{(R_{c,\{1\}}, R_{u,\{1\}}) : (R_{c,\{1\}}, 0, 0, R_{u,\{1\}}, 0, 0) \in \mathcal{R}^*_{\mathrm{full}}\}$;

2) private-update-aided Gray-Wyner system

$\mathcal{R}^*_{\mathrm{puGW}} := \{(R_{c,\{1\}}, R_{c,\{2\}}, R_{c,\{1,2\}}, R_{u,\{1\}}, R_{u,\{2\}}) :$
$(R_{c,\{1\}}, R_{c,\{2\}}, R_{c,\{1,2\}}, R_{u,\{1\}}, R_{u,\{2\}}, 0) \in \mathcal{R}^*_{\mathrm{full}}\}$;

^More results on the general two-user case can be found in [31, Chapter 4], including a full achievable rate region.

3) common-cache-aided Gray-Wyner system

$\mathcal{R}^*_{\mathrm{ccGW}} := \{(R_{c,\{1,2\}}, R_{u,\{1\}}, R_{u,\{2\}}, R_{u,\{1,2\}}) :$
$(0, 0, R_{c,\{1,2\}}, R_{u,\{1\}}, R_{u,\{2\}}, R_{u,\{1,2\}}) \in \mathcal{R}^*_{\mathrm{full}}\}$;

4) sequential successive refinement

$\mathcal{R}^*_{\mathrm{SSR}} := \{(R_{c,\{2\}}, R_{c,\{1,2\}}, R_{u,\{2\}}, R_{u,\{1,2\}}) :$
$(0, R_{c,\{2\}}, R_{c,\{1,2\}}, 0, R_{u,\{2\}}, R_{u,\{1,2\}}) \in \mathcal{R}^*_{\mathrm{full}}\}$.

With an abuse of notation, members of different projections are simply denoted by $\mathbf{R}$ and
its dimension will be clear from context. For the single-user case, all the decoder indices are
dropped, e.g., $f_1(x, y)$ is simply denoted by $f(x, y)$.

III. The Single-User Case 

In this section, we present the main results for the single-user case. This setup is a special
case of the problem of lossy source coding with a helper (and decoder side information), which
remains open in general. We establish the optimal rate region of the considered setup in the
following theorem. The proof is deferred to Appendix A.

Theorem 1: Consider the single-user caching problem. The optimal rate region $\mathcal{R}^*$ is the set
of rate pairs $(R_c, R_u)$ such that

$R_c \ge I(X; V | Y)$, (1)

$R_u \ge H(f(X, Y) | V, Y)$, (2)

for some conditional pmf $p_{V|X}$ with $|\mathcal{V}| \le |\mathcal{X}| + 1$.
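
To make the two bounds (1) and (2) concrete, the following minimal sketch evaluates them for one given test channel $p_{V|X}$ on small finite alphabets. The function name `rate_pair`, the dictionary-based pmf representation, and the toy source (two independent fair bits, of which one is requested uniformly at random) are illustrative assumptions and do not appear in the original development.

```python
import math

def rate_pair(p_xy, p_v_given_x, f):
    """Return (I(X;V|Y), H(f(X,Y)|V,Y)) in bits.

    p_xy:        dict {(x, y): probability}
    p_v_given_x: dict {x: {v: probability}}, i.e. the test channel p_{V|X}
    f:           function of (x, y)
    """
    # Joint pmf of (x, y, v) under the Markov chain V - X - Y.
    joint = {}
    for (x, y), pxy in p_xy.items():
        for v, pv in p_v_given_x[x].items():
            if pxy * pv > 0:
                joint[(x, y, v)] = pxy * pv

    def entropy(key):
        # Entropy of the random variable key(X, Y, V).
        marginal = {}
        for (x, y, v), p in joint.items():
            k = key(x, y, v)
            marginal[k] = marginal.get(k, 0.0) + p
        return -sum(p * math.log2(p) for p in marginal.values())

    h_y = entropy(lambda x, y, v: y)
    h_xy = entropy(lambda x, y, v: (x, y))
    h_vy = entropy(lambda x, y, v: (v, y))
    h_xvy = entropy(lambda x, y, v: (x, v, y))
    h_fvy = entropy(lambda x, y, v: (f(x, y), v, y))

    r_c = h_xy + h_vy - h_xvy - h_y   # I(X;V|Y), bound (1)
    r_u = h_fvy - h_vy                # H(f(X,Y)|V,Y), bound (2)
    return r_c, r_u

# Toy source (assumed): X = (b1, b2) with independent fair bits,
# Y uniform on {1, 2} and independent of X, f(x, y) = y-th bit of x.
xs = [(b1, b2) for b1 in (0, 1) for b2 in (0, 1)]
p_xy = {(x, y): 0.125 for x in xs for y in (1, 2)}
f = lambda x, y: x[y - 1]

cache_first_bit = {x: {x[0]: 1.0} for x in xs}   # V = first bit of X
print(rate_pair(p_xy, cache_first_bit, f))        # approximately (1.0, 0.5)
```

Sweeping over test channels and taking the lower-left envelope of the resulting pairs would trace an inner approximation of the region; the sketch only evaluates a single channel.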

Due to the fact that the update encoder is more informative than the cache encoder, we
have the following corollary from Theorem 1. We denote $R_u^* := \min\{R_u : (0, R_u) \in \mathcal{R}^*\}$ and
$R_c^* := \min\{R_c : (R_c, 0) \in \mathcal{R}^*\}$. The proof is deferred to Appendix B.

Corollary 1: Consider the single-user caching problem. All the following statements hold:

1) $R_u^* = H(f(X, Y) | Y)$.

2) $R_c^* = \min I(X; V | Y)$, where the minimum is over all conditional pmfs $p_{V|X}$ satisfying
$H(f(X, Y) | V, Y) = 0$.

3) $R_c + R_u \ge H(f(X, Y) | Y)$ for all $(R_c, R_u) \in \mathcal{R}^*$.

4) $R_u^* \le R_c^*$.

5) If $(R_c, R_u) \in \mathcal{R}^*$, then for all $a \in [0, R_c]$, $(R_c - a, R_u + a) \in \mathcal{R}^*$.

6) If $(R_c, R_u)$ is an extreme point of $\mathcal{R}^*$ with $R_c > 0$, then for all $a > 0$, $(R_c + a, R_u - a) \notin \mathcal{R}^*$.

Note that Statement 2 of Corollary 1 recovers the result of lossless coding for computing with
side information [15]. Statements 5 and 6 point out in which direction we can move a partial
rate such that the resulting rate pair still resides in $\mathcal{R}^*$.

In general, the optimal sum rate can only be achieved with $(R_c, R_u) = (0, H(f(X, Y) | Y))$,
i.e., the update encoder does all the work. Namely, in general there is a penalty when the request
$Y^k$ is not known at the encoder. Nevertheless, for the class of partially invertible functions, one
can arbitrarily distribute the work load without compromising the sum rate.

Corollary 2: If the function $f$ is partially invertible, i.e., $H(X | f(X, Y), Y) = 0$, then
$R_u^* = R_c^* = H(X | Y)$.

Proof: First, Corollary 1 says that $H(f(X, Y) | Y) = R_u^* \le R_c^* \le H(X | Y)$, where the last
inequality follows by setting $V = X$ in (1). Then, the corollary follows immediately by noting
that

$H(f(X, Y) | Y) = H(f(X, Y), X | Y) = H(X | Y)$,

where the first equality follows since $H(X | f(X, Y), Y) = 0$, by assumption. ■
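
The identity used in the proof can be checked numerically on a small example. The sketch below is only an illustration under assumed inputs: the joint pmf `p_xy` is an arbitrary choice, the helper `cond_entropy` is a hypothetical name, modulo-two sum serves as a partially invertible function, and the logical AND serves as a function that is not partially invertible.

```python
import math

def cond_entropy(p_joint, target, given):
    """H(target(X,Y) | given(X,Y)) in bits for a joint pmf {(x, y): p}."""
    pair, marginal = {}, {}
    for (x, y), p in p_joint.items():
        if p > 0:
            t, g = target(x, y), given(x, y)
            pair[(t, g)] = pair.get((t, g), 0.0) + p
            marginal[g] = marginal.get(g, 0.0) + p
    return -sum(p * math.log2(p / marginal[g]) for (t, g), p in pair.items())

# Assumed joint pmf of a correlated binary pair (X, Y).
p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

h_x_given_y = cond_entropy(p_xy, lambda x, y: x, lambda x, y: y)
# Modulo-two sum: X can be recovered from (f, Y), so H(f|Y) = H(X|Y).
h_xor_given_y = cond_entropy(p_xy, lambda x, y: x ^ y, lambda x, y: y)
# Logical AND: not partially invertible, so H(f|Y) can be strictly smaller.
h_and_given_y = cond_entropy(p_xy, lambda x, y: x & y, lambda x, y: y)

print(h_x_given_y, h_xor_given_y, h_and_given_y)
```

For the modulo-two sum the first two values coincide, in line with Corollary 2, whereas for the AND function the update encoder gains from knowing $Y$.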

In other words, Corollary 2 says that for partially invertible functions, e.g., arithmetic sum and
modulo sum, the side information $Y$ at the update encoder is useless in lowering the compression
rate and thus in this case the cache encoder is as powerful as the update encoder. More generally,
it can be shown that $R_u^* = R_c^*$ if and only if there exists a conditional pmf $p_{V|X}$ such that

1) $H(V | f(X, Y), Y) = H(V | X, Y)$, and

2) $H(f(X, Y) | V, Y) = 0$.

For most single-user caching problems, it is challenging to find a closed-form expression for
the optimal rate region $\mathcal{R}^*$. In other words, we do not know the optimal caching strategy in general.
In the following, we consider three cases where $X$ and $Y$ are independent, which implies that
$I(X; V | Y) = I(X; V)$. For the first two cases, we are able to show that some intuitive caching
strategies are indeed optimal. In the last case, we provide some guidance for the optimal caching
strategy. Without loss of generality, we assume that $\mathcal{Y} = [N]$. Besides, we will find it convenient
to denote $x^{(y)} := f(x, y)$.
Remark 1: If $X$ and $Y$ are independent, then the optimal conditional pmf $p_{V|X}$ for a fixed
cache rate can be found by solving the following constrained optimization problem

maximize $I(f(X, Y), Y; V)$

subject to $I(X; V) \le R_c$

over all conditional pmfs $p_{V|X}$ with $|\mathcal{V}| \le |\mathcal{X}| + 1$. To see this, we observe that

$\arg\min_{p_{V|X}} H(f(X, Y) | V, Y) = \arg\max_{p_{V|X}} H(f(X, Y) | Y) - H(f(X, Y) | V, Y)$

$= \arg\max_{p_{V|X}} I(f(X, Y); V | Y)$

$\overset{(a)}{=} \arg\max_{p_{V|X}} I(f(X, Y), Y; V)$,

where (a) follows since $X$ and $Y$ are independent, by assumption, and $V - X - Y$ form
a Markov chain. Thus, caching has an information bottleneck interpretation [32] (see also [33]).
Given a fixed-size cache as bottleneck, we aim to provide the most relevant information of
the desired function $f(X, Y)$. The existing algorithms developed for the information bottleneck
problem are applicable to numerically approximate the optimal rate region $\mathcal{R}^*$. ◊

A. Independent Source Components 

In this subsection, we consider the case where the source components $X^{(1)}, \ldots, X^{(N)}$ are
independent. Note that we used the shorthand notation $X^{(n)} = f(X, n)$. Without loss of
generality, we assume that $p_Y(1) \ge p_Y(2) \ge \cdots \ge p_Y(N)$. Then, we have the following
proposition.

Proposition 1: If $X$ and $Y$ are independent and the source components $X^{(1)}, \ldots, X^{(N)}$ are
independent, then the optimal rate region $\mathcal{R}^*$ is the set of rate pairs $(R_c, R_u)$ such that

$R_c \ge r$,

$R_u \ge \sum_{n=1}^{N} (p_Y(n) - p_Y(n+1)) \left( \sum_{j=1}^{n} H(X^{(j)}) - r \right)^+$, (3)

for some $r \ge 0$, where $p_Y(N+1) = 0$.
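
The bound (3) is piecewise linear in $r$ and easy to evaluate. The following minimal sketch computes its right-hand side; the function name `update_rate_independent` and the numerical values of the request probabilities and component entropies are assumptions chosen only for illustration.

```python
def update_rate_independent(p_y, entropies, r):
    """Right-hand side of (3).

    p_y:       request probabilities, sorted in non-increasing order
    entropies: entropies[n] = H(X^(n+1)) in bits (0-based list)
    r:         cache rate in bits
    """
    n_components = len(p_y)
    bound, cumulative_entropy = 0.0, 0.0
    for n in range(n_components):
        cumulative_entropy += entropies[n]
        # p_Y(N + 1) = 0 by convention.
        p_next = p_y[n + 1] if n + 1 < n_components else 0.0
        bound += (p_y[n] - p_next) * max(cumulative_entropy - r, 0.0)
    return bound

# Assumed example: three independent components.
p_y = [0.5, 0.3, 0.2]
entropies = [1.0, 2.0, 1.5]

for r in (0.0, 1.0, 3.0, 4.5):
    print(r, update_rate_independent(p_y, entropies, r))
```

At $r = 1.0$ (the most popular component is cached completely) the bound equals $0.3 \cdot 2.0 + 0.2 \cdot 1.5 = 0.9$, which is the average cost of delivering whichever of the two remaining components is requested.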

Relating this to the motivating example in the Introduction, Proposition 1 indicates that the best
caching strategy for independent views is to cache the most popular ones, no matter how different
the video qualities are (see also [34] and the references therein). More generally, when the user
wants to retrieve multiple views at the same time, caching the most popular ones remains optimal.
The details can be found in [|^ Theorem 6].

Proof: Here we prove the achievability part. The converse part is deferred to Appendix C.
Note that (3) is equivalent to saying that

1) if $r \ge \sum_{n=1}^{N} H(X^{(n)})$, then $R_u \ge 0$, and

2) if $\sum_{j=1}^{n-1} H(X^{(j)}) \le r < \sum_{j=1}^{n} H(X^{(j)})$ for some $n \in [N]$, then

$R_u \ge p_Y(n) \left( \sum_{j=1}^{n} H(X^{(j)}) - r \right) + \sum_{j=n+1}^{N} p_Y(j) H(X^{(j)})$.

Therefore, for all $n \in [0 : N]$, setting $V = (X^{(1)}, \ldots, X^{(n)})$ in (1) and (2) shows that the rate
pair

$(R_c, R_u) = \left( \sum_{j=1}^{n} H(X^{(j)}), \sum_{j=n+1}^{N} p_Y(j) H(X^{(j)}) \right)$

is achievable, which corresponds to a corner point of the region described by $R_c \ge r$ and (3).
Since the rest of the points on the boundary can be achieved by memory sharing, the proposition is
established. ■

B. Nested Source Components 

Again using the shorthand notation $X^{(n)} = f(X, n)$, in this subsection we assume that
$H(X^{(n)} | X^{(n+1)}) = 0$ for all $n \in [N - 1]$. Then, we have the following proposition.

Proposition 2: If $X$ and $Y$ are independent and $H(X^{(n)} | X^{(n+1)}) = 0$ for all $n \in [N - 1]$,
then the optimal rate region $\mathcal{R}^*$ is the set of rate pairs $(R_c, R_u)$ such that

$R_c \ge r$,

$R_u \ge \sum_{n=1}^{N} p_Y(n) \left( H(X^{(n)}) - r \right)^+$, (4)

for some $r \ge 0$.
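
As a quick numerical illustration of (4), the sketch below evaluates its right-hand side for nested components, whose entropies are non-decreasing in $n$. The function name `update_rate_nested` and the example values are assumptions made for illustration only.

```python
def update_rate_nested(p_y, entropies, r):
    """Right-hand side of (4).

    p_y:       request probabilities p_Y(1), ..., p_Y(N)
    entropies: entropies[n] = H(X^(n+1)) in bits, non-decreasing because
               each component is a function of the next one
    r:         cache rate in bits
    """
    return sum(p * max(h - r, 0.0) for p, h in zip(p_y, entropies))

# Assumed example: three quality levels of the same view.
p_y = [0.5, 0.3, 0.2]
entropies = [1.0, 2.0, 3.0]

for r in (0.0, 1.0, 2.0, 3.0):
    print(r, update_rate_nested(p_y, entropies, r))
```

At $r = 2.0$ the cache holds the second quality level, and only a request for the finest level costs anything at delivery time: $0.2 \cdot (3.0 - 2.0) = 0.2$.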

If we think of $X^{(1)}, \ldots, X^{(N)}$ as representations of the same view but with different levels of
quality, then Proposition 2 indicates that the best caching strategy is to cache the finest version
that still fits into the cache.

Proof: Here we prove the achievability part. The converse part is deferred to Appendix D.
Note that (4) is equivalent to saying that

1) if $r \ge H(X^{(N)})$, then $R_u \ge 0$, and

2) if $H(X^{(j-1)}) \le r < H(X^{(j)})$ for some $j \in [N]$, where $H(X^{(0)}) := 0$, then

$R_u \ge \sum_{n=j}^{N} p_Y(n) \left( H(X^{(n)}) - r \right)$.

Therefore, for all $n \in [0 : N]$, substituting $V = X^{(n)}$ in (1) and (2) shows that the rate pair

$(R_c, R_u) = \left( H(X^{(n)}), \sum_{j=n+1}^{N} p_Y(j) H(X^{(j)} | X^{(n)}) \right)$

is achievable, which corresponds to a corner point of the region described by $R_c \ge r$ and (4).
Since the rest of the points on the boundary can be achieved by memory sharing, the proposition is
established. ■


C. Arbitrarily Correlated Components with Uniform Requests 

Here we assume that the request is uniformly distributed, i.e., $p_Y(n) = \frac{1}{N}$ for all $n \in [N]$, but
$X^{(1)}, \ldots, X^{(N)}$ can be arbitrarily correlated. Recall that $X^{(n)} = f(X, n)$. Although we cannot
give a closed-form expression of the optimal rate region, we provide a necessary and sufficient
condition on the auxiliary random variable which characterizes the boundary of the optimal rate
region. The proof is deferred to Appendix E.

Proposition 3: If $X$ and $Y$ are independent and $p_Y(n) = \frac{1}{N}$ for all $n \in [N]$, then all points
$(R_c, R_u)$ on the boundary of the optimal rate region $\mathcal{R}^*$ can be expressed as

$R_c = r$,

$R_u = \frac{1}{N} \left( H(X) - r + \min_{p_{V|X} \,:\, I(X; V) = r} \Gamma(X | V) \right)$,

for some $r \in [0, H(X)]$, where $X := (X^{(1)}, X^{(2)}, \ldots, X^{(N)})$ and

$\Gamma(X | V) := \sum_{n=1}^{N} H(X^{(n)} | V) - H(X | V)$.

If $N = 2$, we have

$\Gamma(X | V) = I(X^{(1)}; X^{(2)} | V)$,

so the term $\Gamma(X | V)$ can be interpreted as a generalization of conditional mutual
information. In fact, the term $\Gamma(X^{(1)}, \ldots, X^{(N)}) = \sum_{n=1}^{N} H(X^{(n)}) - H(X^{(1)}, \ldots, X^{(N)})$ was
first studied by Watanabe [35] and given the name total correlation. Following this convention,
we refer to $\Gamma(X^{(1)}, \ldots, X^{(N)} | V)$ as conditional total correlation. Proposition 3 indicates that
an optimal caching strategy is to cache a description of the data that minimizes the conditional
total correlation.
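
The total correlation and its conditional version are straightforward to compute for small alphabets. The sketch below is an illustration under assumed inputs: the helper names are hypothetical, the source is a doubly symmetric binary source with crossover probability 0.1, and the conditioning variable follows the standard construction in which both components are independent noisy copies of a fair bit $V$.

```python
import math

def entropy(pmf):
    """Entropy in bits of a pmf given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def total_correlation(p_joint):
    """sum_n H(X^(n)) - H(X^(1), ..., X^(N)) for p_joint = {tuple: prob}."""
    n_components = len(next(iter(p_joint)))
    value = -entropy(p_joint)
    for n in range(n_components):
        marginal = {}
        for xs, p in p_joint.items():
            marginal[xs[n]] = marginal.get(xs[n], 0.0) + p
        value += entropy(marginal)
    return value

def conditional_total_correlation(p_v, p_joint_given_v):
    """Average over v of the total correlation of p(x^(1), ..., x^(N) | v)."""
    return sum(p_v[v] * total_correlation(p_joint_given_v[v]) for v in p_v)

q = 0.1
dsbs = {(0, 0): (1 - q) / 2, (1, 1): (1 - q) / 2, (0, 1): q / 2, (1, 0): q / 2}
print(total_correlation(dsbs))   # equals I(X1; X2) = 1 - h_b(q)

# Assumed description V: X1 = V xor Z1, X2 = V xor Z2 with independent
# Z1, Z2 ~ Bernoulli(q1), where 2 * q1 * (1 - q1) = q.
q1 = 0.5 * (1 - math.sqrt(1 - 2 * q))
noise = {0: 1 - q1, 1: q1}
p_v = {0: 0.5, 1: 0.5}
p_given_v = {v: {(x1, x2): noise[x1 ^ v] * noise[x2 ^ v]
                 for x1 in (0, 1) for x2 in (0, 1)} for v in (0, 1)}
print(conditional_total_correlation(p_v, p_given_v))   # numerically zero
```

The second value vanishes because the two components are conditionally independent given $V$, which is the situation in which the conditional total correlation is zero.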

When the cache rate is large enough, there exists a conditional pmf $p_{V|X}$ such that the
conditional total correlation is zero and thus we have the following corollary.

Corollary 3: The boundary of the region $\{(R_c, R_u) \in \mathcal{R}^* \mid R_{\mathrm{crit}} \le R_c \le H(X)\}$ is a straight
line $R_c + N R_u = H(X)$, where

$R_{\mathrm{crit}} = \min_{p_{V|X} \,:\, \Gamma(X | V) = 0} I(X; V)$.

Note that when $N = 2$, $R_{\mathrm{crit}}$ is Wyner's common information [36].

Finally, let us consider an example which covers all three mentioned cases.

Example 1: Fix $q \in [0, \frac{1}{2}]$. Consider $Y \sim \mathrm{Uniform}(\{1, 2\})$ and $X = (X^{(1)}, X^{(2)})$, where
$(X^{(1)}, X^{(2)})$ is a doubly symmetric binary source DSBS($q$). Assume that $X$ and $Y$ are independent.
We first consider two extreme cases.

1) If $q = 1/2$, then the two components are independent and
$\mathcal{R}^* = \{(R_c, R_u) : R_c \ge 0, R_u \ge 0, R_c + 2R_u \ge 2\}$.

2) If $q = 0$, then the two components are nested and
$\mathcal{R}^* = \{(R_c, R_u) : R_c \ge 0, R_u \ge 0, R_c + R_u \ge 1\}$.

Now consider $0 < q < \frac{1}{2}$. Wyner's common information of $(X^{(1)}, X^{(2)})$ is known to be [36]

$R_{\mathrm{crit}} = 1 + h_b(q) - 2 h_b(q')$,

where $q' = \frac{1}{2}(1 - \sqrt{1 - 2q})$. Thus, from Corollary 3 we have

$\min\{R_u : R_c \ge R_{\mathrm{crit}}, (R_c, R_u) \in \mathcal{R}^*\} = \frac{1}{2}(1 + h_b(q) - R_c)$.

Note that $R_u \ge \frac{1}{2}(1 + h_b(q) - R_c)^+$ is also a valid lower bound for all $R_c \ge 0$. Besides, from
Statement 3 of Corollary 1 we have $R_c + R_u \ge 1$.

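As for the case $0 < q < \frac{1}{2}$ and $R_c < R_{\mathrm{crit}}$, we do not have a complete characterization. Let
us consider the following choice of the auxiliary random variable $V$. We set

$V = \begin{cases} X^{(1)} \oplus U & \text{if } X^{(1)} = X^{(2)}, \\ W & \text{if } X^{(1)} \neq X^{(2)}, \end{cases}$ (5)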
Fig. 4. Inner bounds and an outer bound for Example 1. Here $q = 0.1$.


where $\oplus$ denotes modulo-two sum, $U, W \in \{0, 1\}$ are independent of $(X, Y)$, and furthermore
$W \sim \mathrm{Bernoulli}(1/2)$. Numerical studies suggest that such a choice of $V$ characterizes the boundary
of $\mathcal{R}^*$ for $R_c < R_{\mathrm{crit}}$. It can be checked that setting

$p_U(1) = \frac{1}{2} - \frac{\sqrt{1 - 2q}}{2(1 - q)}$

achieves Wyner's common information $R_{\mathrm{crit}}$.

In Figure 4 we plot three inner bounds and an outer bound for the case $q = 0.1$, where
$R_{\mathrm{crit}} \approx 0.873$. The first inner bound is plotted as a green dotted line, which results from memory sharing
between the extreme points $(R_c^*, 0)$ and $(0, R_u^*)$. The extreme point $(R_{\mathrm{crit}}, \frac{1}{2}(1 + h_b(q) - R_{\mathrm{crit}}))$
is marked by a blue diamond. Then, the second inner bound is formed by memory sharing
among the three extreme points. The third inner bound has the same boundary as the second
inner bound for $R_c \ge R_{\mathrm{crit}}$. As for $R_c < R_{\mathrm{crit}}$, the third inner bound is plotted as a red solid line,
which results from evaluating the choice (5) with $p_U(1)$ ranging from the value that achieves
$R_{\mathrm{crit}}$ up to $0.5$. Finally, the combined outer bound
$R_u \ge \max\{\frac{1}{2}(1 + h_b(q) - R_c)^+, (1 - R_c)^+\}$ is plotted as a black solid line.

We remark that it can be shown that $H(X^{(n)} \oplus V) = H(X^{(n)} | V)$, $n \in \{1, 2\}$. Thus, instead
of Slepian-Wolf coding, the update encoder can simply compress $X^{(n)} \oplus V$ and transmit. Then,
after recovering $X^{(n)} \oplus V$, the decoder removes $V$ to get the desired component $X^{(n)}$. ◊
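
The numbers quoted in Example 1 can be reproduced with a few lines. The sketch below is a hedged illustration: the function names are hypothetical, and the value used for $\Pr\{U = 1\}$ is an assumption obtained from the standard construction of Wyner's common information for a DSBS (both components are copies of a fair bit passed through independent binary symmetric channels with crossover probability $q'$), which gives $\Pr\{U = 1\} = q'^2 / (1 - q)$.

```python
import math
from itertools import product

def hb(p):
    """Binary entropy function in bits, with 0 log 0 := 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def wyner_common_information_dsbs(q):
    """R_crit = 1 + h_b(q) - 2 h_b(q') with q' = (1 - sqrt(1 - 2q)) / 2."""
    q_prime = 0.5 * (1 - math.sqrt(1 - 2 * q))
    return 1 + hb(q) - 2 * hb(q_prime)

def outer_bound(r_c, q):
    """Combined lower bound on R_u from Corollaries 3 and 1."""
    return max(0.5 * max(1 + hb(q) - r_c, 0.0), max(1 - r_c, 0.0))

def cache_rate(q, p_u1):
    """I(X;V) for V as in (5), with Pr{U = 1} = p_u1 and W ~ Bernoulli(1/2)."""
    joint, p_x, p_v = {}, {}, {}
    for x1, x2, v in product((0, 1), repeat=3):
        prob_x = (1 - q) / 2 if x1 == x2 else q / 2
        if x1 == x2:
            prob_v = p_u1 if v != x1 else 1 - p_u1   # V = X1 xor U
        else:
            prob_v = 0.5                              # V = W
        p = prob_x * prob_v
        joint[(x1, x2, v)] = p
        p_x[(x1, x2)] = p_x.get((x1, x2), 0.0) + p
        p_v[v] = p_v.get(v, 0.0) + p
    return sum(p * math.log2(p / (p_x[(x1, x2)] * p_v[v]))
               for (x1, x2, v), p in joint.items() if p > 0)

q = 0.1
r_crit = wyner_common_information_dsbs(q)
p_u1_crit = 0.5 - math.sqrt(1 - 2 * q) / (2 * (1 - q))   # assumed value
print(round(r_crit, 3))                  # about 0.873
print(cache_rate(q, p_u1_crit) - r_crit) # numerically zero
print(outer_bound(r_crit, q))            # update rate at the critical point
```

With $\Pr\{U = 1\} = 0.5$ the description $V$ is independent of $X$ and the cache rate drops to zero, so sweeping this parameter between the assumed critical value and $0.5$ covers all cache rates in $[0, R_{\mathrm{crit}}]$.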
IV. The Two-User Extensions
A. Sequential Coding over the Gray-Wyner Systems 

Figures 5 and 6 depict the private-update-aided Gray-Wyner system and the common-cache-
aided Gray-Wyner system, respectively. In the private-update-aided Gray-Wyner system, the 
cache encoder plays the role of encoder in the Gray-Wyner system and is aided by the update 
encoder through the private updates. By contrast, in the common-cache-aided Gray-Wyner 
system, the update encoder plays the role of encoder in the Gray-Wyner system and is aided by 
the cache encoder through the common cache. By applying sequential coding over the Gray- 
Wyner systems, we have the single-letter characterization for the considered setups. The proofs 
follow similar lines as in the single-user case and thus are omitted. 

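Theorem 2: Consider the private-update-aided Gray-Wyner system. The optimal rate region
$\mathcal{R}^*_{\mathrm{puGW}}$ is the set of rate tuples $\mathbf{R}$ such that

$R_{c,\{1,2\}} \ge I(X; V_c | Y)$,

$R_{c,\{1\}} \ge I(X; V_1 | V_c, Y)$,

$R_{c,\{2\}} \ge I(X; V_2 | V_c, Y)$,

$R_{u,\{1\}} \ge H(f_1(X, Y) | V_c, V_1, Y)$,

$R_{u,\{2\}} \ge H(f_2(X, Y) | V_c, V_2, Y)$,

for some conditional pmf $p_{V_c|X} \, p_{V_1|V_c,X} \, p_{V_2|V_c,X}$ satisfying $|\mathcal{V}_c| \le |\mathcal{X}| + 4$ and
$|\mathcal{V}_j| \le |\mathcal{V}_c||\mathcal{X}| + 1$, $j \in \{1, 2\}$.

Theorem 3: Consider the common-cache-aided Gray-Wyner system. The optimal rate region
$\mathcal{R}^*_{\mathrm{ccGW}}$ is the set of rate tuples $\mathbf{R}$ such that

$R_{c,\{1,2\}} \ge I(X; V_c | Y)$,

$R_{u,\{1,2\}} \ge I(X; V_u | V_c, Y)$,

$R_{u,\{1\}} \ge H(f_1(X, Y) | V_c, V_u, Y)$,

$R_{u,\{2\}} \ge H(f_2(X, Y) | V_c, V_u, Y)$,

for some conditional pmf $p_{V_c|X} \, p_{V_u|V_c,X,Y}$ satisfying $|\mathcal{V}_c| \le |\mathcal{X}| + 3$ and
$|\mathcal{V}_u| \le |\mathcal{V}_c||\mathcal{X}||\mathcal{Y}| + 2$.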
Fig. 5. The private-update-aided Gray-Wyner system.
Fig. 6. The common-cache-aided Gray-Wyner system.
Fig. 7. The problem of sequential successive refinement.


B. Sequential Successive Refinement 

Figure 7 plots the problem of sequential successive refinement, which can be seen as a special
case of the distributed successive refinement problem. In the first stage, the cache encoder and
the update encoder each send a coarse description to both decoders, and then in the second stage
they each send a refined description only to Decoder 2. For this setup, we have a single-letter
characterization for the optimal rate region given in the following theorem.

Theorem 4: Consider the problem of sequential successive refinement. The optimal rate region
$\mathcal{R}^*_{\mathrm{SSR}}$ is the set of rate tuples $\mathbf{R}$ such that

$R_{c,\{1,2\}} \ge I(X; V_c | Y)$,

$R_{c,\{1,2\}} + R_{c,\{2\}} \ge I(X; V_c | Y) + I(X; V_2 | f_1(X, Y), V_c, Y)$,

$R_{u,\{1,2\}} \ge H(f_1(X, Y) | V_c, Y)$,

$R_{u,\{1,2\}} + R_{u,\{2\}} \ge H(f_1(X, Y) | V_c, Y) + H(f_2(X, Y) | V_2, f_1(X, Y), V_c, Y)$,

for some conditional pmf $p_{V_2, V_c|X}$ satisfying $|\mathcal{V}_c| \le |\mathcal{X}| + 3$ and $|\mathcal{V}_2| \le |\mathcal{V}_c||\mathcal{X}| + 1$.

Proof: The converse part is deferred to Appendix F. The achievability can be proved by
applying Theorem 1 and its straightforward extension. Here we provide a high-level description.
Consider a simple two-stage coding. In the first stage, we use a multiple description code which
follows the achievability for the single-user case and each encoder sends its generated message
through its common link. Since both messages $(M_{c,\{1,2\}}, M_{u,\{1,2\}})$ are also received by Decoder
2, Decoder 2 can also learn $\{(V_{c,i}, f_1(X_i, Y_i))\}_{i \in [k]}$. Then, in the second stage, we use another
multiple description code which follows the achievability for the single-user case but with the
augmented side information $(\{(V_{c,i}, f_1(X_i, Y_i))\}_{i \in [k]}, Y^k)$. Once the messages are generated, each
encoder can divide its message of the second stage into two parts, one of which is sent through
the common link and the other is sent through the private link. ■

V. The Static Request Model 

Consider a database modeled by a DMS $(X)$, which generates an i.i.d. source sequence
$X^{kF}$, where $k$ and $F$ are positive integers. The request is modeled by a DMS $(Y)$, which
generates an i.i.d. sequence $Y^k$. Here we assume that the request sequence $Y^k$ is independent
of the database $X^{kF}$. For all $i \in [k]$, we say that the $i$th block consists of the subsequence
$(X_{(i-1)F+1}, X_{(i-1)F+2}, \ldots, X_{iF})$ and the request $Y_i$. We assume that the user is interested in
recovering $f(X_{(i-1)F+j}, Y_i)$ for all $i \in [k]$ and $j \in [F]$. Then, the information-theoretic model
corresponds to the case where $F = 1$ and thus each $X_i$ is paired with a distinct $Y_i$, $i \in [k]$.
Alternatively, one can think of it as processing

$X_j, X_{F+j}, X_{2F+j}, \ldots, X_{(k-1)F+j}$

for different $j \in [F]$ separately. We will refer to it as coding across blocks.
Fig. 8. The single-user case of the static request model.


In this section we consider the static request model, which corresponds to the case where
$k = 1$ (see Figure 8). Alternatively, one can think of it as processing

$$X_{(i-1)F+1}, X_{(i-1)F+2}, \ldots, X_{(i-1)F+F}$$

for different $i \in [k]$ separately. We will refer to it as coding within block. We remark that both
coding across blocks and coding within block are special cases of coding over multiple instances,
i.e., coding over the whole source sequence $X^{kF}$.
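The two index patterns can be made concrete with a short sketch. The helper names `across_blocks` and `within_block` are hypothetical and only enumerate the 1-based source indices that each viewpoint processes together; they are not part of any coding scheme in the paper.

```python
def across_blocks(k: int, F: int, j: int) -> list[int]:
    """Indices of X_j, X_{F+j}, ..., X_{(k-1)F+j} for a fixed position j in [F]."""
    return [(i - 1) * F + j for i in range(1, k + 1)]


def within_block(k: int, F: int, i: int) -> list[int]:
    """Indices of X_{(i-1)F+1}, ..., X_{iF} for a fixed block i in [k]."""
    return [(i - 1) * F + j for j in range(1, F + 1)]


k, F = 3, 4
print(across_blocks(k, F, 2))  # [2, 6, 10]
print(within_block(k, F, 2))   # [5, 6, 7, 8]
```

Either family of index sets partitions $\{1, \ldots, kF\}$; the difference is that every symbol in a within-block group shares one request, whereas every symbol in an across-blocks group is paired with a different request.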

We can draw an analogy between the request models for content delivery networks and the 
fading models for wireless networks. If we think of the requests as channel states, then the 
requests behave like fast fading in the information-theoretic model. On the other hand, in the 
static request model, the requests behave like quasi-static fading. 

For convenience, we assume that $\mathcal{Y} = [N]$, where $N$ is a positive integer. In the static request
model, the cache encoder observes the source sequence $X^F$, the update encoder observes the
source sequence $X^F$ and the single request $Y_1$, and the decoder only observes the request $Y_1$.
The cache encoder generates a message $M_c \in [2^{FR_c}]$, and the update encoder generates a message
$M_u(Y) \in [2^{FR_u(Y)}]$. The decoder receives $(M_c, M_u(Y))$ and wishes to recover the sequence of
functions $\{f(X_j, Y)\}_{j\in[F]}$ losslessly. Note that the length of the update message is generally a
function of $Y$.

A $(2^{FR_c}, \{2^{FR_u(y)}\}_{y\in[N]}, F)$ distributed multiple description code consists of

• one cache encoder, which assigns an index $m_c(x^F) \in [2^{FR_c}]$ to each sequence $x^F \in \mathcal{X}^F$,

• one update encoder, which assigns $N$ indices $m_u(x^F, y) \in [2^{FR_u(y)}]$, $y \in [N]$, to each
sequence $x^F \in \mathcal{X}^F$,

• one decoder, which assigns an estimate $\hat{s}^F$ to each tuple $(m_c, m_u, y)$.


For notational convenience, we drop the subscript 1 in $Y_1$ hereafter.

In words, there are $N$ codebooks at the update encoder and the decoder, each of which serves
one request $y \in [N]$. On the other hand, since the cache encoder does not observe the request
$Y$ and cannot infer anything about $Y$ from the observed source sequence $X^F$, the cache encoder
only needs one codebook.

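For the static request model, a rate tuple $(R_c, \{R_u(y)\}_{y\in[N]})$ is said to be achievable if there
exists a sequence of $(2^{FR_c}, \{2^{FR_u(y)}\}_{y\in[N]}, F)$ codes such that

$$\lim_{F\to\infty} P\Big(\bigcup_{j\in[F]} \{\hat{S}_j \ne f(X_j, y)\}\Big) = 0, \quad \forall y \in [N]. \tag{6}$$

The optimal rate region $\mathcal{R}^*_{\text{static}}$ is the closure of the set of achievable rate tuples.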

The above problem can be considered as an $N$-user Gray-Wyner system: Each user $y \in [N]$
receives a private message $M_u(y)$ and a common message $M_c$. The common message is received
by all users. Each user $y \in [N]$ wishes to recover the sequence $\{f(X_j, y)\}_{j\in[F]}$. Then, we have
the following theorem.

Theorem 5: Consider the single-user case under the static request model. The optimal rate
region $\mathcal{R}^*_{\text{static}}$ is the set of rate tuples $(R_c, \{R_u(y)\}_{y\in[N]})$ such that

$$R_c \ge I(X;V),$$

$$R_u(y) \ge H(f(X,y)|V), \quad \forall y \in [N],$$

for some conditional pmf $p_{V|X}$ with $|\mathcal{V}| \le |\mathcal{X}| + N$.
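To make the two bounds in Theorem 5 tangible, the following sketch evaluates $I(X;V)$ and $H(f(X,y)|V)$ for one fixed test channel $p_{V|X}$. The function `static_rate_tuple` and the toy instance (two uniform bits, with the cache storing the low bit) are illustrative assumptions rather than anything taken from the paper, and the sketch does not search over test channels.

```python
import math
from collections import defaultdict


def entropy(pmf):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)


def cond_entropy(triples):
    """H(S | C) in bits from (s, c, prob) triples, computed as H(S, C) - H(C)."""
    joint, marg = defaultdict(float), defaultdict(float)
    for s, c, p in triples:
        joint[(s, c)] += p
        marg[c] += p
    return entropy(joint.values()) - entropy(marg.values())


def static_rate_tuple(p_x, p_v_given_x, f, N):
    """Return (I(X;V), [H(f(X,y)|V) for y = 1..N]) for one test channel."""
    xv = [(x, v, px * pv) for x, px in p_x.items()
          for v, pv in p_v_given_x[x].items()]
    i_xv = entropy(p_x.values()) - cond_entropy(xv)  # I(X;V) = H(X) - H(X|V)
    updates = [cond_entropy([(f(x, y), v, p) for x, v, p in xv])
               for y in range(1, N + 1)]
    return i_xv, updates


# Hypothetical toy instance: X is two uniform bits; request 1 asks for the
# low bit, request 2 for the high bit; the cache variable V is the low bit.
p_x = {x: 0.25 for x in range(4)}
p_v_given_x = {x: {x & 1: 1.0} for x in p_x}
f = lambda x, y: x & 1 if y == 1 else x >> 1

cache_rate, update_rates = static_rate_tuple(p_x, p_v_given_x, f, N=2)
print(cache_rate, update_rates)
```

In this instance the cache costs one bit per symbol, the update for request 1 is free, and request 2 still needs one bit per symbol, which shows how a single choice of $V$ yields request-dependent update rates.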

The static request model can be studied under more specific performance criteria, which relate
the request-dependent update rates $\{R_u(y)\}$ to a fixed quantity $R_u$, which is independent of the
realization of $Y$. In the following, we discuss two common performance criteria by analogy with
quasi-static fading. For convenience, we denote

$$\mathcal{R}^*(R_c) = \{\mathbf{R} : (r, \mathbf{R}) \in \mathcal{R}^*_{\text{static}},\ r \le R_c\},$$

where $\mathbf{R} = (R_u(1), R_u(2), \ldots, R_u(N))$.


A. Compound (The Worst Case) 

Consider a fixed update rate $R_u$. The compound formulation requires that for each block of
length $F$, the update message cannot contain more than $\lfloor FR_u \rfloor$ bits. Namely, it requires that for
all $y \in [N]$,

$$R_u(y) \le R_u.$$

We remark that this setup is equivalent to the worst-case formulation considered by Maddah-Ali
and Niesen in [?]. Thus, the compound-optimal update rate given the cache rate $R_c$ can be
defined as

$$R^*_{\text{compound}}(R_c) := \min_{\mathbf{R} \in \mathcal{R}^*(R_c)} \max_{y\in[N]} R_u(y).$$

Then, from Theorem 5, we have the following corollary.

Corollary 4: Consider the compound formulation of the static request model. The compound-optimal
update rate given the cache rate $R_c$ can be expressed as

$$R^*_{\text{compound}}(R_c) = \min_{p_{V|X}:\ I(X;V) \le R_c}\ \max_{y\in[N]} H(f(X,y)|V),$$

where $|\mathcal{V}| \le |\mathcal{X}| + N$.

The compound formulation aims to model the worst-case scenario in which the request
statistics are not known and/or the communication resource cannot be redistributed over blocks.
This formulation is also studied in [?]. Finally, we remark that one may relax the compound
formulation by tolerating some outage events, i.e., events that require update rates higher than a
predefined value. More details of the outage formulation can be found in [?, Chapter 3.7.2].

B. Adaptive Coding (The Average Case) 

Different from the compound formulation, here the unused communication resource can be
saved for the other blocks. The only requirement is that the average number of bits per block
cannot be larger than $FR_u$. This setup is equivalent to the average-case formulation considered
by Niesen and Maddah-Ali in [?]. To see the relation of the adaptive coding formulation with
the information-theoretic model, let us consider the whole rate region instead. We define the
adaptive-optimal rate region as

$$\mathcal{R}^*_{\text{adaptive}} := \big\{ (R_c, \mathrm{E}_Y[R_u(Y)]) : (R_c, \{R_u(y)\}_{y\in[N]}) \in \mathcal{R}^*_{\text{static}} \big\}.$$

From Theorem 5, we have the following corollary.

Corollary 5: Consider the adaptive coding formulation of the static request model. The
adaptive-optimal rate region $\mathcal{R}^*_{\text{adaptive}}$ is the set of rate pairs $(R_c, R_u)$ such that

$$R_c \ge I(X;V), \tag{7}$$

$$R_u \ge H(f(X,Y)|V,Y), \tag{8}$$

for some conditional pmf $p_{V|X}$ with $|\mathcal{V}| \le |\mathcal{X}| + 1$.

Fig. 9. The system considered in Example 2. The information-theoretic model corresponds to the case where F = 1 and the
static request model corresponds to the case where k = 1.

Proof: It suffices to show that $\mathrm{E}_Y[\phi(Y)] = H(f(X,Y)|V,Y)$, where $\phi(y) = H(f(X,y)|V)$.
Indeed, we have

$$\begin{aligned}
\mathrm{E}_Y[\phi(Y)] &= \sum_{y=1}^{N} p_Y(y)\,\phi(y) \\
&= \sum_{y=1}^{N} p_Y(y)\,H(f(X,y)|V) \\
&\overset{(a)}{=} \sum_{y=1}^{N} p_Y(y)\,H(f(X,y)|V, Y = y) \\
&= H(f(X,Y)|V,Y),
\end{aligned}$$

where (a) follows since $X$ is independent of $Y$, by assumption, and $V -\!\circ\!- X -\!\circ\!- Y$ form a
Markov chain. Finally, we remark that the cardinality bound on $V$ is refined from $|\mathcal{X}| + N$ to
$|\mathcal{X}| + 1$, which can be proved using the convex cover method [?, Appendix C]. ■
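The averaging step in this proof, and the contrast with the worst-case criterion of Corollary 4, can be checked numerically for a fixed test channel. The instance below (a noisy copy of the low bit as the cache variable, and a non-uniform request pmf) is a made-up example; `phi`, `averaged`, `direct` and `worst_case` are names introduced only for this sketch.

```python
import math
from collections import defaultdict


def entropy(pmf):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)


def cond_entropy(triples):
    """H(S | C) in bits from (s, c, prob) triples, computed as H(S, C) - H(C)."""
    joint, marg = defaultdict(float), defaultdict(float)
    for s, c, p in triples:
        joint[(s, c)] += p
        marg[c] += p
    return entropy(joint.values()) - entropy(marg.values())


# Hypothetical instance: X is two uniform bits, Y is independent of X,
# and V is the low bit of X flipped with probability 0.1.
p_x = {x: 0.25 for x in range(4)}
p_y = {1: 0.7, 2: 0.3}
p_v_given_x = {x: {x & 1: 0.9, 1 - (x & 1): 0.1} for x in p_x}
f = lambda x, y: x & 1 if y == 1 else x >> 1

xv = [(x, v, px * pv) for x, px in p_x.items()
      for v, pv in p_v_given_x[x].items()]

# phi(y) = H(f(X, y) | V), one value per request.
phi = {y: cond_entropy([(f(x, y), v, p) for x, v, p in xv]) for y in p_y}

averaged = sum(p_y[y] * phi[y] for y in p_y)              # E_Y[phi(Y)]
direct = cond_entropy([(f(x, y), (v, y), p * py)          # H(f(X,Y) | V, Y)
                       for x, v, p in xv for y, py in p_y.items()])
worst_case = max(phi.values())                            # compound criterion

print(averaged, direct, worst_case)
```

The first two printed values agree up to floating-point error, as the proof asserts, while the worst-case value is at least as large as the average.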

For the single-user caching problem, Corollary 5 thus shows that the rate region for the
information-theoretic model (Theorem 1) takes exactly the same shape as the rate region for the
average case of the Maddah-Ali-Niesen model (Equations (7) and (8)), in spite of the fact that
the modeling assumptions are rather different. Beyond the single-user case, this equivalence of
rate regions continues to hold at least for the extended Gray-Wyner systems studied in Section
IV-A, which can be established by arguments along the lines of the proof of Corollary 5. In more
general multi-user cases, however, the rate region of the information-theoretic model is different
from the rate region of the Maddah-Ali-Niesen model, as the following example illustrates:

Example 2: Consider the system depicted in Figure 9. The system is a special case of sequential
successive refinement, discussed in Section IV-B, in which $R_{c,\{1,2\}} = R_{u,\{2\}} = 0$. We assume
that $X \sim \text{Bernoulli}(1/2)$, $Y \sim \text{Uniform}(\{1, 2\})$, $f_1(X,Y) = X \cdot 1\{Y = 1\}$, and $f_2(X,Y) = X$,
where $1\{\cdot\}$ is the indicator function. We observe that if $Y = 1$, then the cache content at User
2 is useless because the update encoder has to send $X$ to User 1 anyway.

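Following from Theorem 4, the optimal rate region under the information-theoretic model,
denoted by $\mathcal{R}^*_1$, is the set of rate pairs $(R_{c,\{2\}}, R_{u,\{1,2\}})$ such that

$$R_{c,\{2\}} \ge I(X;V|f_1(X,Y),Y),$$

$$R_{u,\{1,2\}} \ge H(f_1(X,Y)|Y) + H(f_2(X,Y)|V, f_1(X,Y), Y),$$

for some conditional pmf $p_{V|X}$. After specializing to the considered instance, we have the
following closed-form expression:

$$R_{c,\{2\}} \ge \tfrac{1}{2}(0 + r) = \tfrac{r}{2},$$

$$R_{u,\{1,2\}} \ge \frac{1 + (1 - r)^+}{2},$$

for some $r \ge 0$. Here the term 0 corresponds to $Y = 1$, where $f_1(X,Y) = X$, and the term
$r = I(X;V)$ corresponds to $Y = 2$.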

On the other hand, although a single-letter characterization for the optimal rate region under the
static request model, denoted by $\mathcal{R}^*_2$, is unknown in general, we have a closed-form expression
for the considered instance: $\mathcal{R}^*_2$ is the set of rate tuples $(R_{c,\{2\}}, R_{u,\{1,2\}}(1), R_{u,\{1,2\}}(2))$ such
that

$$R_{c,\{2\}} \ge \max\{0, r\} = r,$$

$$R_{u,\{1,2\}}(1) \ge 1,$$

$$R_{u,\{1,2\}}(2) \ge (1 - r)^+,$$

for some $r \ge 0$.

For the achievability, we borrow the coding insights gained from the information-theoretic
model, in which Decoder 2 decodes $f_1(X,Y)$ first and uses it as side information. For the static
request model, all rate tuples $(R_{c,\{2\}}, R_{u,\{1,2\}}(1), R_{u,\{1,2\}}(2))$ satisfying

$$R_{c,\{2\}} \ge \max_{y\in\{1,2\}} I(X;V|f_1(X,y)),$$

$$R_{u,\{1,2\}}(y) \ge H(f_1(X,y)) + H(f_2(X,y)|V, f_1(X,y)), \quad y \in \{1, 2\},$$

for some conditional pmf $p_{V|X}$ are achievable. Then, the achievability follows by setting
$V = (XQ, Q)$, where $Q \sim \text{Bernoulli}(\min\{r, 1\})$ is independent of $(X, Y)$.

To see the optimality, we note that if $Y = 1$, then Decoder 1 must be able to recover $X^F$
losslessly from $M_{u,\{1,2\}}(1)$ and thus we have $R_{u,\{1,2\}}(1) \ge 1$. If $Y = 2$, then Decoder 2 must be
able to recover $X^F$ losslessly from $M_{c,\{2\}}$ and $M_{u,\{1,2\}}(2)$, which gives the sum rate constraint

$$R_{c,\{2\}} + R_{u,\{1,2\}}(2) \ge 1.$$

Therefore, the corresponding adaptive-optimal rate region is the set of rate pairs $(R_{c,\{2\}}, R_{u,\{1,2\}})$
such that

$$R_{c,\{2\}} \ge r,$$

$$R_{u,\{1,2\}} \ge \frac{1 + (1 - r)^+}{2},$$

for some $r \ge 0$. ◊
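The gap between the two models in Example 2 can be tabulated directly from the closed-form regions. The sketch below assumes the boundaries read off above, namely $R_{c,\{2\}} \ge r/2$ for the information-theoretic model and $R_{c,\{2\}} \ge r$ for the adaptive static model, both with $R_{u,\{1,2\}} \ge (1 + (1-r)^+)/2$; the function names are hypothetical.

```python
def pos(u: float) -> float:
    """Positive part (u)^+ = max(u, 0)."""
    return max(u, 0.0)


def it_min_update_rate(rc: float) -> float:
    """Smallest update rate at cache rate rc in the information-theoretic model.

    The cache constraint is rc >= r/2, so the largest usable r is 2 * rc.
    """
    return (1 + pos(1 - 2 * rc)) / 2


def adaptive_static_min_update_rate(rc: float) -> float:
    """Smallest average update rate at cache rate rc in the adaptive static model.

    The cache constraint is rc >= r, so the largest usable r is rc.
    """
    return (1 + pos(1 - rc)) / 2


for rc in (0.0, 0.25, 0.5, 1.0):
    print(rc, it_min_update_rate(rc), adaptive_static_min_update_rate(rc))
```

For cache rates strictly between 0 and 1 the information-theoretic model needs a strictly smaller update rate, and the two boundaries coincide at cache rate 0 and at cache rates of 1 or more.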

The above example shows that coding across blocks can be more beneficial than coding within 
block and we conjecture that it is always the case. Furthermore, we conjecture that whenever 
coding across blocks is feasible, coding within block provides no additional gain. 

VI. Conclusion 

In this paper, we have formulated the caching problem as a multi-terminal source coding 
problem with side information. The key observation is that we can treat the requested data as 
a function of the whole data and the request. All forms of data and requests can be simply 
modeled by random variables $X$ and $Y$, respectively, and their relation can be simply described
by a function $f$. Thanks to the formulation, many coding techniques and insights can be directly
borrowed from the well-developed source coding literature. For the single-user case, we have 
given a single-letter characterization of the optimal rate region and found closed-form expressions 
for two interesting cases. Then, using the insights gained from the Gray-Wyner system and the 
successive refinement problem, we were able to find single-letter expressions of optimal rate 
region for three two-user scenarios. Finally, we showed that although the information-theoretic 
model and the static request model have a similar behavior for the single-user case, they in 
general behave differently in multi-user scenarios. 

Appendix A 
Proof of Theorem 1

(Achievability.) The proof follows from standard random coding arguments as in the problem
of lossless source coding with a helper. Here we provide a high-level description of the coding
scheme. First, the cache encoder applies Wyner-Ziv coding on $x^k$ given side information $y^k$ so
that the decoder learns $v^k$, a quantized version of $x^k$. Then the update encoder applies Slepian-Wolf
coding on the function sequence $(f(x_i, y_i) : i \in [k])$ given side information $(v^k, y^k)$.

(Converse.) Denote $S_i = f(X_i, Y_i)$, $i \in [k]$. First, we have

$$\begin{aligned}
kR_c &\ge H(M_c) \\
&\ge H(M_c|Y^k) \\
&= I(X^k; M_c|Y^k) \\
&= \sum_{i=1}^{k} I(X_i; M_c|X^{i-1}, Y^k) \\
&\overset{(a)}{=} \sum_{i=1}^{k} I(X_i; M_c, X^{i-1}, Y^{i-1}, Y_{i+1}^{k}|Y_i) \\
&\ge \sum_{i=1}^{k} I(X_i; V_i|Y_i),
\end{aligned}$$

where (a) follows since $(X_i, Y_i)$ is independent of $(X^{i-1}, Y^{i-1}, Y_{i+1}^{k})$. For the last step, we define
$V_i := (M_c, S^{i-1}, Y^{i-1}, Y_{i+1}^{k})$. Note that $V_i -\!\circ\!- X_i -\!\circ\!- Y_i$ form a Markov chain.

Next, we have

$$\begin{aligned}
kR_u &\ge H(M_u) \\
&\ge H(M_u|M_c, Y^k) \\
&= H(S^k, M_u|M_c, Y^k) - H(S^k|M_c, M_u, Y^k) \\
&\overset{(a)}{\ge} H(S^k|M_c, Y^k) - k\epsilon_k \\
&= \sum_{i=1}^{k} H(S_i|M_c, S^{i-1}, Y^k) - k\epsilon_k \\
&= \sum_{i=1}^{k} H(S_i|V_i, Y_i) - k\epsilon_k,
\end{aligned}$$

where (a) follows from the data processing inequality and Fano's inequality. The rest of the
proof follows from the standard time-sharing argument and then letting $k \to \infty$. The cardinality
bound on $V$ can be proved using the convex cover method [?, Appendix C].

Remark 2: First, the converse can also be established by identifying the auxiliary random
variable $V_i = (M_c, X^{i-1}, Y^{i-1}, Y_{i+1}^{k})$. Second, the optimal rate region $\mathcal{R}^*$ is convex since the
auxiliary random variable $V$ performs memory sharing implicitly. Finally, as can be seen from
the achievability, even if the update encoder is restricted to access only the function sequence
$(f(x_i, y_i) : i \in [k])$, instead of $(x^k, y^k)$, the rate region remains the same. ◊


Appendix B 

Proof of Corollary □ 

Statements 1 and 2 are straightforward. Statement 3 can be proved by a simple cut-set
argument. Alternatively, from Theorem 1 we observe that

$$\begin{aligned}
R_c + R_u &\ge I(X;V|Y) + H(f(X,Y)|V,Y) \\
&\ge I(f(X,Y);V|Y) + H(f(X,Y)|V,Y) \\
&= H(f(X,Y)|Y).
\end{aligned}$$

Then, the lower bound on sum rate implies $R_c^* \ge H(f(X,Y)|Y) = R_u^*$, i.e., Statement 4.

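Now we prove Statement 5. Assume that $(R_c, R_u) \in \mathcal{R}^*$ and fix any $a \in [0, R_c]$. The case
where $R_c = 0$ is trivial. Next, we consider the case where $R_c > 0$. Time sharing between $(0, R_u^*)$
and $(R_c, R_u)$, with weight $a/R_c$ on the former, asserts that

$$\Big(R_c - a,\ R_u + a\,\frac{R_u^* - R_u}{R_c}\Big) \in \mathcal{R}^*.$$

Then, Statements 1 and 3 imply that

$$\frac{R_u^* - R_u}{R_c} \le 1,$$

so it holds that $(R_c - a, R_u + a) \in \mathcal{R}^*$.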

Finally, we prove Statement 6. Assume that $(R_c + a, R_u - a) \in \mathcal{R}^*$. Then, memory sharing
between $(0, R_u^*)$ and $(R_c + a, R_u - a)$ asserts that

$$\Big(R_c,\ \frac{aR_u^* + R_c(R_u - a)}{R_c + a}\Big) \in \mathcal{R}^*.$$

However, Statement 3 implies that

$$\frac{aR_u^* + R_c(R_u - a)}{R_c + a} < R_u,$$

which contradicts the assumption that $(R_c, R_u)$ is an extreme point.

Appendix C

Proof of Proposition 1 (Converse)

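Suppose that $(R_c, R_u) \in \mathcal{R}^*$. Then, there exists a conditional pmf $p_{V|X}$ such that
$R_c \ge I(X;V|Y) =: r$ and $R_u \ge H(f(X,Y)|V,Y)$. For all $n \in [N]$, we have

$$\begin{aligned}
r &= I(X;V|Y) \\
&\overset{(a)}{=} I(X;V) \\
&\ge I(X^{(1)}, \ldots, X^{(n)}; V) \\
&\ge \sum_{j=1}^{n} H(X^{(j)}) - \sum_{j=1}^{n} H(X^{(j)}|V), \qquad (9)
\end{aligned}$$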

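where (a) follows since $X$ and $Y$ are independent. Next we show that $R_u$ can be lower bounded
as in (11):

$$\begin{aligned}
R_u &\ge H(f(X,Y)|V,Y) \\
&= \sum_{j=1}^{N} p_Y(j) H(X^{(j)}|V) \\
&\overset{(a)}{\ge} p_Y(N)\Big(\sum_{j=1}^{N} H(X^{(j)}) - r - \sum_{j=1}^{N-1} H(X^{(j)}|V)\Big)^+ + \sum_{j=1}^{N-1} p_Y(j) H(X^{(j)}|V) \\
&\overset{(b)}{\ge} p_Y(N)\Big(\sum_{j=1}^{N} H(X^{(j)}) - r\Big)^+ + \sum_{j=1}^{N-1} \big(p_Y(j) - p_Y(N)\big) H(X^{(j)}|V), \qquad (10)
\end{aligned}$$

where (a) follows from (9) with $n = N$ and $H(X^{(N)}|V) \ge 0$, and (b) follows since
$(u - v)^+ \ge (u)^+ - v$ for all $v \ge 0$. The last term on the right-hand side of (10) can be lower bounded as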


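$$\begin{aligned}
&\sum_{j=1}^{N-1} \big(p_Y(j) - p_Y(N)\big) H(X^{(j)}|V) \\
&\quad\overset{(a)}{\ge} \big(p_Y(N-1) - p_Y(N)\big)\Big(\sum_{j=1}^{N-1} H(X^{(j)}) - r - \sum_{j=1}^{N-2} H(X^{(j)}|V)\Big)^+ + \sum_{j=1}^{N-2} \big(p_Y(j) - p_Y(N)\big) H(X^{(j)}|V) \\
&\quad\ge \big(p_Y(N-1) - p_Y(N)\big)\Big(\sum_{j=1}^{N-1} H(X^{(j)}) - r\Big)^+ + \sum_{j=1}^{N-2} \big(p_Y(j) - p_Y(N-1)\big) H(X^{(j)}|V),
\end{aligned}$$

where (a) follows from (9) with $n = N - 1$ and $H(X^{(N-1)}|V) \ge 0$. At this point, it is clear
that we can apply the same argument another $N - 2$ times and arrive at

$$R_u \ge \sum_{n=1}^{N} \big(p_Y(n) - p_Y(n+1)\big)\Big(\sum_{j=1}^{n} H(X^{(j)}) - r\Big)^+, \tag{11}$$

where $p_Y(N + 1) = 0$.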
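The right-hand side of (11) is easy to evaluate once the component entropies and the request pmf are known. The sketch below assumes that the requests are indexed so that $p_Y(1) \ge \cdots \ge p_Y(N)$, which is what keeps the coefficients $p_Y(n) - p_Y(n+1)$ nonnegative in the argument above; the function name and the numbers are illustrative only.

```python
def independent_components_bound(p_y, h, r):
    """Right-hand side of (11).

    p_y[n-1] = p_Y(n), assumed sorted in non-increasing order;
    h[n-1]   = H(X^{(n)}) in bits;
    r        = the cache-side quantity I(X;V).
    """
    total, cumulative_entropy = 0.0, 0.0
    for n in range(len(p_y)):
        cumulative_entropy += h[n]
        p_next = p_y[n + 1] if n + 1 < len(p_y) else 0.0  # p_Y(N+1) = 0
        total += (p_y[n] - p_next) * max(cumulative_entropy - r, 0.0)
    return total


p_y = [0.5, 0.3, 0.2]
h = [1.0, 1.0, 1.0]
print(independent_components_bound(p_y, h, r=0.0))
print(independent_components_bound(p_y, h, r=1.0))
```

With $r = 0$ the bound reduces to $\sum_n p_Y(n) H(X^{(n)})$, the update rate without any cache, and it decreases as $r$ grows because the cache budget is first credited against the most likely requests.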


Appendix D 

Proof of Proposition [2] (Converse) 

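Suppose that $(R_c, R_u) \in \mathcal{R}^*$. Then, there exists a conditional pmf $p_{V|X}$ such that
$R_c \ge I(X;V|Y) =: r$ and $R_u \ge H(f(X,Y)|V,Y)$. For all $n \in [N]$, we have

$$\begin{aligned}
r &= I(X;V|Y) \\
&\overset{(a)}{=} I(X;V) \\
&\ge I(X^{(1)}, \ldots, X^{(n)}; V) \\
&\overset{(b)}{=} H(X^{(n)}) - \sum_{j=1}^{n} H(X^{(j)}|V, X^{(j-1)}), \qquad (12)
\end{aligned}$$

where (a) follows since $X$ and $Y$ are independent and (b) follows from the assumption that
$H(X^{(n)}|X^{(n+1)}) = 0$ for all $n \in [N - 1]$, with the convention that $X^{(0)}$ is a constant.
Next, we show that $R_u$ can be lower bounded as in (13):

$$\begin{aligned}
R_u &\ge H(f(X,Y)|V,Y) \\
&= \sum_{n=1}^{N} p_Y(n) H(X^{(n)}|V) \\
&\overset{(a)}{=} \sum_{n=1}^{N} p_Y(n) H(X^{(1)}, \ldots, X^{(n)}|V) \\
&\overset{(b)}{=} \sum_{n=1}^{N} p_Y(n) \sum_{j=1}^{n} H(X^{(j)}|V, X^{(j-1)}) \\
&= \sum_{j=1}^{N} \Big(\sum_{n=j}^{N} p_Y(n)\Big) H(X^{(j)}|V, X^{(j-1)}),
\end{aligned}$$

where (a) and (b) follow from the assumption that $H(X^{(n)}|X^{(n+1)}) = 0$ for all $n \in [N - 1]$.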

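For notational convenience, let us denote $s_j = \sum_{n=j}^{N} p_Y(n)$ and $q_j = H(X^{(j)}|V, X^{(j-1)})$. Then,
we have

$$\begin{aligned}
R_u &\ge \sum_{j=1}^{N} s_j q_j \\
&\overset{(a)}{\ge} s_N \Big(H(X^{(N)}) - r - \sum_{j=1}^{N-1} q_j\Big)^+ + \sum_{j=1}^{N-1} s_j q_j \\
&\overset{(b)}{\ge} s_N \big(H(X^{(N)}) - r\big)^+ + \sum_{j=1}^{N-1} (s_j - s_N) q_j \\
&= p_Y(N) \big(H(X^{(N)}) - r\big)^+ + \sum_{j=1}^{N-1} (s_j - s_N) q_j \\
&\overset{(c)}{\ge} p_Y(N) \big(H(X^{(N)}) - r\big)^+ + (s_{N-1} - s_N)\Big(H(X^{(N-1)}) - r - \sum_{j=1}^{N-2} q_j\Big)^+ + \sum_{j=1}^{N-2} (s_j - s_N) q_j \\
&\overset{(d)}{\ge} p_Y(N) \big(H(X^{(N)}) - r\big)^+ + (s_{N-1} - s_N)\big(H(X^{(N-1)}) - r\big)^+ + \sum_{j=1}^{N-2} (s_j - s_{N-1}) q_j \\
&= \sum_{n=N-1}^{N} p_Y(n) \big(H(X^{(n)}) - r\big)^+ + \sum_{j=1}^{N-2} (s_j - s_{N-1}) q_j,
\end{aligned}$$

where (a) and (c) follow from (12) and $H(X^{(n)}|V, X^{(n-1)}) \ge 0$ with $n = N$ and $n = N - 1$,
respectively, and (b) and (d) follow since $(u - v)^+ \ge (u)^+ - v$ for all $v \ge 0$. At this point, it is
clear that we can apply the same argument another $N - 2$ times and arrive at

$$R_u \ge \sum_{n=1}^{N} p_Y(n) \big(H(X^{(n)}) - r\big)^+. \tag{13}$$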
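Two ingredients of this argument lend themselves to a quick numerical check: the exchange of summation order that produces the tail sums $s_j$, and the final bound (13). The sketch below uses made-up values for $p_Y$, $q_j$ and $H(X^{(n)})$; the helper names are hypothetical.

```python
def tail_sums(p_y):
    """Return [s_1, ..., s_N] with s_j = sum_{n=j}^{N} p_Y(n)."""
    out, acc = [], 0.0
    for p in reversed(p_y):
        acc += p
        out.append(acc)
    return out[::-1]


def nested_components_bound(p_y, h, r):
    """Right-hand side of (13); h[n-1] = H(X^{(n)}) in bits."""
    return sum(p * max(hn - r, 0.0) for p, hn in zip(p_y, h))


p_y = [0.5, 0.3, 0.2]
q = [0.4, 0.1, 0.3]          # stand-ins for q_j = H(X^{(j)} | V, X^{(j-1)})
s = tail_sums(p_y)

# sum_n p_Y(n) sum_{j<=n} q_j  equals  sum_j s_j q_j
lhs = sum(p_y[n] * sum(q[: n + 1]) for n in range(len(p_y)))
rhs = sum(sj * qj for sj, qj in zip(s, q))
print(lhs, rhs)

# Nested components have non-decreasing entropies, e.g. 1, 2 and 3 bits.
print(nested_components_bound(p_y, [1.0, 2.0, 3.0], r=1.5))
```

Unlike (11), the bound (13) does not depend on the ordering of the request probabilities, since each request $n$ contributes its own term $p_Y(n)(H(X^{(n)}) - r)^+$.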

n=l 

Appendix E 

Proof of Proposition [3] 

Denote by $\mathcal{R}$ the set of rate pairs $(R_c, R_u)$ such that

$$R_c \ge I(\mathbf{X};V|Y),$$

$$R_u \ge H(f(X,Y)|V,Y),$$

for some conditional pmf $p_{V|X}$, where $\mathbf{X} := (X^{(1)}, \ldots, X^{(N)})$ and $X^{(n)} := f(X, n)$.
Since $I(X;V|Y) \ge I(\mathbf{X};V|Y)$, we have $\mathcal{R}^* \subseteq \mathcal{R}$. Also, it
can be shown that the rate region $\mathcal{R}$ is achievable, so we conclude that $\mathcal{R}^* = \mathcal{R}$. By using the
assumptions that $X$ and $Y$ are independent and that $Y$ is uniformly distributed, we can simplify
the rate expressions as

$$R_c \ge I(\mathbf{X};V),$$

$$R_u \ge \frac{1}{N} \sum_{n=1}^{N} H(X^{(n)}|V).$$

Now denote by $p_{V|\mathbf{X}}$ the conditional pmf induced by the conditional pmf $p_{V|X}$. As can be
checked, both $I(\mathbf{X};V)$ and $\{H(X^{(n)}|V)\}_{n\in[N]}$ can be completely determined by the induced
conditional pmf $p_{V|\mathbf{X}}$. Thus, it suffices to consider the space of all conditional pmfs $p_{V|\mathbf{X}}$.
Finally, noting that

$$\sum_{n=1}^{N} H(X^{(n)}|V) = H(\mathbf{X}) - I(\mathbf{X};V) + \Gamma(\mathbf{X}|V),$$

it holds that if $R_c = r \in [0, H(\mathbf{X})]$, then

$$\min\{R_u \mid R_c = r,\ (R_c, R_u) \in \mathcal{R}^*\} = \frac{1}{N}\Big(H(\mathbf{X}) - r + \min_{p_{V|\mathbf{X}}:\ I(\mathbf{X};V) = r} \Gamma(\mathbf{X}|V)\Big).$$
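The decomposition of the sum of conditional entropies can be verified on a small joint pmf. The sketch below assumes the usual definition of conditional total correlation, $\Gamma(\mathbf{X}|V) = \sum_n H(X^{(n)}|V) - H(\mathbf{X}|V)$, and uses a made-up pmf with $N = 2$ correlated bits and a noisy cache variable; all names are hypothetical.

```python
import math
from collections import defaultdict


def entropy(pmf):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)


def H(joint, keep):
    """Entropy of the coordinates listed in `keep` of a {tuple: prob} pmf."""
    marg = defaultdict(float)
    for outcome, p in joint.items():
        marg[tuple(outcome[i] for i in keep)] += p
    return entropy(marg.values())


# Hypothetical joint pmf of (X1, X2, V): X1 and X2 are uniform bits that
# agree with probability 0.9, and V equals X1 flipped with probability 0.2.
joint = {}
for x1 in (0, 1):
    for x2 in (0, 1):
        for v in (0, 1):
            agree = 0.9 if x1 == x2 else 0.1
            noise = 0.8 if v == x1 else 0.2
            joint[(x1, x2, v)] = 0.5 * agree * noise

h_x = H(joint, [0, 1])                                    # H(X)
i_xv = h_x + H(joint, [2]) - H(joint, [0, 1, 2])          # I(X;V)
sum_cond = sum(H(joint, [n, 2]) - H(joint, [2]) for n in (0, 1))
h_x_given_v = H(joint, [0, 1, 2]) - H(joint, [2])
gamma = sum_cond - h_x_given_v    # conditional total correlation (assumed definition)

print(sum_cond, h_x - i_xv + gamma)
```

Both printed numbers coincide up to rounding, which is the identity used in the last step: for a fixed cache rate $r = I(\mathbf{X};V)$, minimizing the update rate amounts to minimizing $\Gamma(\mathbf{X}|V)$.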


Appendix F 

Proof of Theorem [4] (Converse) 

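Denote $S_{1i} = f_1(X_i, Y_i)$ and $S_{2i} = f_2(X_i, Y_i)$ for $i \in [k]$. The rates $R_{c,\{1,2\}}$ and $R_{u,\{1,2\}}$
can be lower bounded in the same manner as the single-user case and thus the details are
omitted. Denote $V_{ci} = (M_{c,\{1,2\}}, S_1^{i-1}, Y^{i-1}, Y_{i+1}^{k})$. Now consider the bounds on $R_{c,\{1,2\}} + R_{c,\{2\}}$
and $R_{u,\{1,2\}} + R_{u,\{2\}}$. First, we have

$$\begin{aligned}
&k(R_{c,\{1,2\}} + R_{c,\{2\}}) \\
&\quad\ge H(M_{c,\{1,2\}}, M_{c,\{2\}}|Y^k) \\
&\quad= I(X^k; M_{c,\{1,2\}}, M_{c,\{2\}}|Y^k) \\
&\quad= I(S_1^k; M_{c,\{1,2\}}, M_{c,\{2\}}|Y^k) + I(X^k; M_{c,\{1,2\}}, M_{c,\{2\}}|S_1^k, Y^k) \\
&\quad\ge \sum_{i=1}^{k} I(S_{1i}; M_{c,\{1,2\}}|S_1^{i-1}, Y^k) + \sum_{i=1}^{k} I(X_i; M_{c,\{1,2\}}, M_{c,\{2\}}|X^{i-1}, S_1^k, Y^k) \\
&\quad\overset{(a)}{=} \sum_{i=1}^{k} I(S_{1i}; M_{c,\{1,2\}}, S_1^{i-1}, Y^{i-1}, Y_{i+1}^{k}|Y_i) \\
&\qquad + \sum_{i=1}^{k} I(X_i; M_{c,\{1,2\}}, S_1^{i-1}, Y^{i-1}, Y_{i+1}^{k}, M_{c,\{2\}}, X^{i-1}, S_{1,i+1}^{k}|S_{1i}, Y_i) \\
&\quad= \sum_{i=1}^{k} I(S_{1i}; V_{ci}|Y_i) + \sum_{i=1}^{k} I(X_i; V_{ci}, V_{2i}|S_{1i}, Y_i),
\end{aligned}$$

where (a) follows since $(X_i, Y_i, S_{1i})$ is independent of $(X^{i-1}, S_1^{i-1}, S_{1,i+1}^{k}, Y^{i-1}, Y_{i+1}^{k})$. For the last step,
we define $V_{2i} := (M_{c,\{2\}}, X^{i-1}, S_{1,i+1}^{k})$. Note that $(V_{ci}, V_{2i}) -\!\circ\!- X_i -\!\circ\!- Y_i$ form a Markov
chain. Second, we have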

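$$\begin{aligned}
&k(R_{u,\{1,2\}} + R_{u,\{2\}}) \\
&\quad\ge H(M_{u,\{1,2\}}) + H(M_{u,\{2\}}) \\
&\quad\overset{(a)}{\ge} H(S_1^k, M_{u,\{1,2\}}|M_{c,\{1,2\}}, Y^k) + H(M_{u,\{2\}}) - k\epsilon_k' \\
&\quad= \sum_{i=1}^{k} H(S_{1i}|S_1^{i-1}, M_{c,\{1,2\}}, Y^k) + H(M_{u,\{1,2\}}|S_1^k, M_{c,\{1,2\}}, Y^k) + H(M_{u,\{2\}}) - k\epsilon_k' \\
&\quad\ge \sum_{i=1}^{k} H(S_{1i}|V_{ci}, Y_i) + H(M_{u,\{1,2\}}, M_{u,\{2\}}|S_1^k, M_{c,\{1,2\}}, M_{c,\{2\}}, Y^k) - k\epsilon_k' \\
&\quad\overset{(b)}{\ge} \sum_{i=1}^{k} H(S_{1i}|V_{ci}, Y_i) + H(S_2^k, M_{u,\{1,2\}}, M_{u,\{2\}}|S_1^k, M_{c,\{1,2\}}, M_{c,\{2\}}, Y^k) - k(\epsilon_k' + \epsilon_k'') \\
&\quad\ge \sum_{i=1}^{k} H(S_{1i}|V_{ci}, Y_i) + H(S_2^k|S_1^k, M_{c,\{1,2\}}, M_{c,\{2\}}, Y^k) - k(\epsilon_k' + \epsilon_k'') \\
&\quad= \sum_{i=1}^{k} H(S_{1i}|V_{ci}, Y_i) + \sum_{i=1}^{k} H(S_{2i}|S_2^{i-1}, S_1^k, M_{c,\{1,2\}}, M_{c,\{2\}}, Y^k) - k(\epsilon_k' + \epsilon_k'') \\
&\quad\ge \sum_{i=1}^{k} H(S_{1i}|V_{ci}, Y_i) + \sum_{i=1}^{k} H(S_{2i}|S_{1i}, V_{2i}, V_{ci}, Y_i) - k(\epsilon_k' + \epsilon_k''),
\end{aligned}$$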

where (a) and (b) follow from the data processing inequality and Fano's inequality. The rest of
the proof follows from the standard time-sharing argument, letting $k \to \infty$, and the fact that

$$I(f_1(X,Y);V_c|Y) + I(X;V_c,V_2|f_1(X,Y),Y) = I(X;V_c|Y) + I(X;V_2|f_1(X,Y),V_c,Y).$$
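The identity above can be checked with two applications of the chain rule for mutual information. The sketch below writes $S_1 := f_1(X,Y)$ as shorthand; this label is introduced here only for brevity.

```latex
\begin{align*}
I(X;V_c|Y) &= I(X,S_1;V_c|Y)
  && \text{since } S_1 \text{ is a function of } (X,Y) \\
&= I(S_1;V_c|Y) + I(X;V_c|S_1,Y), \\
I(X;V_c,V_2|S_1,Y) &= I(X;V_c|S_1,Y) + I(X;V_2|S_1,V_c,Y), \\
\intertext{and eliminating $I(X;V_c|S_1,Y)$ between the two lines gives}
I(S_1;V_c|Y) + I(X;V_c,V_2|S_1,Y) &= I(X;V_c|Y) + I(X;V_2|S_1,V_c,Y).
\end{align*}
```

The left-hand side is the form produced by the converse chain for the sum cache rate, and the right-hand side is the form in which the sum cache rate bound is written in the proof of optimality for Example 2's parent theorem.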

The cardinality bounds on $V_c$ and $V_2$ can be proved using the convex cover method [?, Appendix
C].